Clustering UMLS Semantic Relations Between Medical Concepts
Abstract
We propose and implement a semi-supervised framework for automatically discovering UMLS semantic relations. The framework uses semantic, syntactic, and orthographic features at both the global and the local level. We experimented with several distance metrics for clustering: Euclidean distance, spherical k-means (cosine) distance, and Kullback-Leibler (KL) divergence. With only 10% seeding, our feature set with KL divergence achieves a macro-averaged F-measure of 70.6% on level-1 UMLS semantic relation clustering and 61.4% on level-2 clustering. The system can be used, with reasonably good accuracy and coverage, to explore the hierarchical structure of semantic relations in the medical domain.

A large part of human learning consists of acquiring knowledge about the relationships between entities and concepts. The real world contains thousands of relations that would take humans years to learn. Because this kind of knowledge is usually conveyed in natural language, we can build on recent advances in machine learning and natural language processing to construct a system that assists or mimics the human learning process and builds such a relation knowledge base without human intervention. Our work aims to demonstrate the feasibility of such a system by experimenting on a controlled domain whose entities and concepts have relatively well-defined relations. Specifically, we automatically harvest noun-phrase pairs from a corpus (PubMed abstracts (1) in our case) and automatically cluster the pairs according to the semantic relations that hold between them. We chose PubMed abstracts as the corpus because the core relationships between entities and concepts in PubMed are well defined in the UMLS semantic relation network (2).
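The clustering step described above can be illustrated with seeded k-means under KL divergence. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each noun-phrase pair is already a nonnegative feature-count vector, turns each vector into a probability distribution with add-alpha smoothing (the smoothing and `alpha` are assumptions), and uses the constrained variant of seeded k-means, in which the seeded pairs (the roughly 10% with known relation labels) keep their labels throughout. The names `to_distribution`, `kl_divergence`, `mean_vector`, and `seeded_kl_kmeans` are hypothetical, and the toy counts are invented for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def to_distribution(counts: Sequence[float], alpha: float = 0.01) -> Vector:
    """Add-alpha smoothing (an assumption, not from the paper), then normalize to sum to 1."""
    smoothed = [c + alpha for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p: Vector, q: Vector) -> float:
    """KL(p || q) in nats; both inputs must be strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise arithmetic mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def seeded_kl_kmeans(
    points: List[Vector],
    seeds: Dict[int, int],  # point index -> relation label in range(k)
    k: int,
    max_iter: int = 50,
) -> List[int]:
    """Hypothetical constrained seeded k-means with KL(point || centroid) as the distance."""
    # Initialize each centroid from the seeded points carrying that label.
    centroids: List[Vector] = []
    for label in range(k):
        members = [points[i] for i, lab in seeds.items() if lab == label]
        if not members:
            raise ValueError(f"label {label} has no seed")
        centroids.append(mean_vector(members))

    # Seeded points keep their labels; -1 marks "not yet assigned".
    assignment = [seeds.get(i, -1) for i in range(len(points))]
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            if i in seeds:
                continue
            dists = [kl_divergence(p, c) for c in centroids]
            best = dists.index(min(dists))
            if best != assignment[i]:
                assignment[i] = best
                changed = True
        if not changed:
            break
        # For KL(p || c), the arithmetic mean of the members minimizes the total
        # divergence, so the update matches ordinary k-means. No cluster is ever
        # empty because every label keeps at least its seeds.
        for label in range(k):
            members = [points[i] for i, lab in enumerate(assignment) if lab == label]
            centroids[label] = mean_vector(members)
    return assignment


# Toy, hand-made count vectors (not data from the paper): the first three
# pairs fire mostly feature 0, the last three mostly feature 1.
raw = [[5, 0, 1], [4, 1, 0], [6, 0, 0], [0, 5, 1], [1, 6, 0], [0, 4, 0]]
points = [to_distribution(r) for r in raw]
labels = seeded_kl_kmeans(points, seeds={0: 0, 3: 1}, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Replacing `kl_divergence` with squared Euclidean distance keeps the mean-centroid update valid and gives ordinary constrained seeded k-means; a spherical variant would instead L2-normalize the vectors and assign by cosine similarity.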
We use the UMLS TFA parser (3) to parse the abstracts and obtain chunked noun phrases; we then use the Link Grammar Parser (4) to acquire the syntactic links among those chunks. We prefer this chunk-then-link approach over tree-style full parsing (such as the Collins parser (5)) because full parsing does not give the types of links between noun phrases. One can trace the intermediate nodes of the tree from one phrase to the other as a substitute, but these are not genuine link types; in fact, every noun-phrase pair is "linked" through the "ROOT" node, which makes it harder to distinguish different syntactic dependencies. We then harvest all the noun-phrase pairs …
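The advantage of typed links can be illustrated by deriving, for every pair of noun-phrase chunks, the sequence of link labels on the shortest path between them. This sketch assumes a hypothetical representation in which a sentence is a list of chunks plus a list of `(left, right, label)` links. The example links use Link Grammar-style labels (Ss, Os, Mp, Js) but were written by hand for illustration rather than produced by the parser, and `link_path` and `harvest_np_pairs` are hypothetical names. Because the source's description of the harvesting step is truncated, this does not reproduce the authors' actual pair-selection criteria.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (left chunk index, right chunk index, link label).
Link = Tuple[int, int, str]


def link_path(n_chunks: int, links: List[Link], src: int, dst: int) -> Optional[List[str]]:
    """Return the link labels on a shortest path from src to dst, or None if unconnected."""
    adj: Dict[int, List[Tuple[int, str]]] = {i: [] for i in range(n_chunks)}
    for a, b, label in links:
        adj[a].append((b, label))
        adj[b].append((a, label))  # treat links as undirected for path finding
    prev: Dict[int, Tuple[int, str]] = {}
    seen = {src}
    queue = deque([src])
    while queue:  # breadth-first search from src
        node = queue.popleft()
        if node == dst:
            break
        for nxt, label in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                prev[nxt] = (node, label)
                queue.append(nxt)
    if dst not in seen:
        return None
    # Walk back from dst to src, collecting labels, then reverse into src->dst order.
    path: List[str] = []
    node = dst
    while node != src:
        node, label = prev[node]
        path.append(label)
    return list(reversed(path))


def harvest_np_pairs(
    chunks: List[str], np_indices: List[int], links: List[Link]
) -> List[Tuple[str, str, List[str]]]:
    """Pair every two noun-phrase chunks that the links connect, keeping the typed path."""
    pairs = []
    for i, j in combinations(np_indices, 2):
        path = link_path(len(chunks), links, i, j)
        if path is not None:
            pairs.append((chunks[i], chunks[j], path))
    return pairs


# Hand-built illustrative example, not parser output.
chunks = ["aspirin", "reduces", "the risk", "of", "stroke"]
links = [(0, 1, "Ss"), (1, 2, "Os"), (2, 3, "Mp"), (3, 4, "Js")]
pairs = harvest_np_pairs(chunks, np_indices=[0, 2, 4], links=links)
for a, b, path in pairs:
    print(a, "->", b, path)
# aspirin -> the risk ['Ss', 'Os']
# aspirin -> stroke ['Ss', 'Os', 'Mp', 'Js']
# the risk -> stroke ['Mp', 'Js']
```

Unlike a path through a constituency tree, where every pair meets at ROOT, each path here is a distinct sequence of dependency types that can serve directly as a syntactic feature for the pair.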